Computers are everywhere.

Computers are used in _ and _ applications.

Computer systems (_ and _) are incredibly _.

• With complexity comes a propensity for _.

• Two approaches: 
1.1 Fault Classification


• Definitions 

- A fault (or failure) can be either a _ or a _

- An _ is a manifestation of the _.

• Examples

- Output of adder circuit _

- sin(x) computation really _

• Fault effects can _.

• To limit this spread, designers incorporate _.

• These containment zones are _ that reduce the chance that an effect can spread.


• Hardware faults can be: 


• Hardware faults are _ or _
1.2 Types of Redundancy


All of fault tolerance is an exercise in _ and _

_ - the property of _ than is minimally necessary.

Four forms of redundancy:_ 


• Hardware redundancy is provided by _ in the design to _ errors.

- It can be _

• The best-known form of _ redundancy, _ and _ coding, is widely used in _ and _ codes are also used to protect data communicated over _ (channels subject to many _ failures) channels. _ upon detection of an error is _ redundancy. _ redundancy leads to hardware


Electrical and Computer Engineering 


UAH Chapter 1 CPE 633 

1.3 Basic Measures of Fault Tolerance 


• What does it mean to make machines more _?

- We need_ 

• Traditional Measures 

- _, _, is the probability that the system has been _ in the time interval [0,t]. It is suitable for applications in which even a _ can prove costly.

• Mean Time To Failure (MTTF)

• Mean Time Between Failures (MTBF)

• Mean Time To Repair (MTTR)

• MTBF = MTTF + MTTR

_, _, is the average _ over the interval [0,t] that the system is _

$$A = \lim_{t \to \infty} A(t) = \frac{\text{MTTF}}{\text{MTBF}} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
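
The long-run ratio above can be checked numerically. The following is a minimal sketch, assuming MTTF and MTTR are given in hours; the function name and the sample figures (999 hours up, 1 hour to repair) are hypothetical and chosen only for illustration.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return A = MTTF / MTBF, where MTBF = MTTF + MTTR."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical system: fails on average every 999 hours, takes 1 hour to repair.
availability = steady_state_availability(999.0, 1.0)

# Expected downtime over one year of continuous operation (8760 hours).
downtime_hours_per_year = (1.0 - availability) * 8760

print(f"A = {availability:.4f}, downtime = {downtime_hours_per_year:.2f} h/year")
```

The same availability can result from very different MTTF and MTTR pairs, which is one reason the ratio alone does not describe how often a system fails.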


_, _, is the probability that the system is up at _

• All this is nice as long as we know what _ means.

- Some cases are simple, _ for example.

- Other cases not so much, what if _?

- Many systems have _ states

• Extension of traditional measures to _

_ of a system with n processors.

$$\mathrm{ACC} = \sum_{i=1}^{n} c_i P_i(t)$$

• c_i is the _ _ processors

• P_i(t) is the probability that exactly _ operational at time t
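
The weighted sum for ACC can be evaluated once the c_i and P_i(t) are known. The sketch below is only an illustration under stated assumptions that are not in the source: c_i is read as the capacity of the system when i processors are operational, processors fail independently, and each is still up at time t with probability exp(-failure_rate * t). All function names and numbers are hypothetical.

```python
from math import comb, exp


def prob_exactly_i_up(n: int, i: int, p_up: float) -> float:
    """P_i(t) under the independence assumption: binomial probability of i of n up."""
    return comb(n, i) * p_up**i * (1.0 - p_up) ** (n - i)


def average_computational_capacity(capacities: list[float], p_up: float) -> float:
    """ACC = sum over i = 1..n of c_i * P_i(t); capacities[i - 1] holds c_i."""
    n = len(capacities)
    return sum(
        c_i * prob_exactly_i_up(n, i, p_up)
        for i, c_i in enumerate(capacities, start=1)
    )


# Hypothetical 4-processor system whose capacity grows linearly: c_i = i.
failure_rate = 0.001  # assumed per-hour failure rate of one processor
t = 100.0             # hours
p_up = exp(-failure_rate * t)

acc = average_computational_capacity([1.0, 2.0, 3.0, 4.0], p_up)
print(f"ACC at t = {t:.0f} h: {acc:.4f}")
```

With c_i = i the sum reduces to n times p_up, which gives a quick sanity check on the computation.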

• Network Measures 

- Classical _ and _ - the minimum number of _ and _ that have to fail before the network becomes _.

- Average _

- Maximum _ (_)


[Figure: Network N1 and Network N2]
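
Distance-based network measures can be computed directly from an adjacency list. The sketch below assumes, since the slide leaves the terms blank, that the average and maximum measures refer to the average and maximum hop distance between node pairs. The 4-node ring is a hypothetical example and is not meant to reproduce networks N1 or N2.

```python
from collections import deque


def bfs_distances(adj: dict, source) -> dict:
    """Hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_measures(adj: dict) -> tuple:
    """Return (average node-pair distance, maximum node-pair distance)."""
    nodes = list(adj)
    total, pairs, maximum = 0, 0, 0
    for s in nodes:
        dist = bfs_distances(adj, s)
        if len(dist) != len(nodes):
            raise ValueError("network is disconnected")
        for target, d in dist.items():
            if target != s:
                total += d
                pairs += 1
                maximum = max(maximum, d)
    if pairs == 0:  # single-node network has no pairs
        return 0.0, 0
    return total / pairs, maximum


# Hypothetical 4-node ring: 0 - 1 - 2 - 3 - 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
avg_distance, max_distance = distance_measures(ring)
print(avg_distance, max_distance)
```

Rerunning the function after deleting a node or link from the adjacency list shows how a failure changes both measures, or raises an error once the network is disconnected.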